# Multilingual visual understanding

## Qwen2.5 VL 72B Instruct GGUF

Publisher: lmstudio-community · License: Other · Downloads: 668 · Likes: 1
Tags: Image-to-Text, English

A multimodal large model from Tongyi Qianwen (Qwen) that accepts image and text input and generates text, with 128K long-context processing and multilingual capabilities.
## Aya Vision 32B

Publisher: CohereLabs · Downloads: 387 · Likes: 193
Tags: Image-to-Text, Transformers, Supports Multiple Languages

Aya Vision 32B is an open-weight 32B-parameter multimodal model developed by Cohere Labs, supporting vision-language tasks in 23 languages.
## Aya Vision 8B

Publisher: CohereLabs · Downloads: 29.94k · Likes: 282
Tags: Image-to-Text, Transformers, Supports Multiple Languages

Aya Vision 8B is an open-weight 8-billion-parameter multilingual vision-language model supporting visual and language tasks in 23 languages.
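
Because Aya Vision is published as a standard Transformers checkpoint, it can be loaded through the Hugging Face image-text-to-text API. The snippet below is a minimal sketch, not an official example: the repo id `CohereLabs/aya-vision-8b`, the image URL, and the prompt are assumptions, and the model card may impose access or license requirements.

```python
# Minimal sketch: multilingual image captioning/QA with an Aya Vision checkpoint.
# The repo id below is an assumption inferred from the publisher listed above.
import torch
from transformers import AutoProcessor, AutoModelForImageTextToText

model_id = "CohereLabs/aya-vision-8b"  # assumed repo id; check the model card

processor = AutoProcessor.from_pretrained(model_id)
model = AutoModelForImageTextToText.from_pretrained(
    model_id, torch_dtype=torch.float16, device_map="auto"
)

# Chat-style multimodal message: one image (placeholder URL) plus a text question.
messages = [
    {
        "role": "user",
        "content": [
            {"type": "image", "url": "https://example.com/chart.png"},
            {"type": "text", "text": "Describe this image in French."},
        ],
    }
]

inputs = processor.apply_chat_template(
    messages,
    add_generation_prompt=True,
    tokenize=True,
    return_dict=True,
    return_tensors="pt",
).to(model.device)

output = model.generate(**inputs, max_new_tokens=200)
# Decode only the newly generated tokens, skipping the prompt.
generated = output[0][inputs["input_ids"].shape[1]:]
print(processor.tokenizer.decode(generated, skip_special_tokens=True))
```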
## Llama 3.2 11B Vision Instruct Abliterated 8 Bit

Publisher: mlx-community · Downloads: 128 · Likes: 0
Tags: Image-to-Text, Transformers, Supports Multiple Languages

A multimodal model based on Llama-3.2-11B-Vision-Instruct that accepts image and text input and generates text output.
## Pix2struct Screen2words Base

Publisher: google · License: Apache-2.0 · Downloads: 262 · Likes: 24
Tags: Image-to-Text, Transformers, Supports Multiple Languages

Pix2Struct is a vision-language understanding model optimized for generating functional descriptive captions from UI screenshots.
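
Since the Screen2Words checkpoint is a Transformers model, generating a caption for a screenshot can look like the sketch below. This is an illustrative example, not the publisher's official usage: the repo id `google/pix2struct-screen2words-base` is inferred from the listing above, and the screenshot URL is a placeholder.

```python
# Minimal sketch: UI-screenshot captioning with Pix2Struct (Screen2Words variant).
# The repo id is an assumption; verify it on the model card.
import requests
from PIL import Image
from transformers import Pix2StructForConditionalGeneration, Pix2StructProcessor

model_id = "google/pix2struct-screen2words-base"
processor = Pix2StructProcessor.from_pretrained(model_id)
model = Pix2StructForConditionalGeneration.from_pretrained(model_id)

# Load a UI screenshot (placeholder URL).
url = "https://example.com/app_screenshot.png"
image = Image.open(requests.get(url, stream=True).raw)

# Screen2Words is a captioning task, so only the image is passed; no text header.
inputs = processor(images=image, return_tensors="pt")
generated_ids = model.generate(**inputs, max_new_tokens=50)
print(processor.decode(generated_ids[0], skip_special_tokens=True))
```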